Data on cloning, expression and biochemical characteristics of a chondroitin sulfate/dermatan sulfate 4-O-endosulfatase

The data shown in this article are related to the published paper entitled “A novel 4-O-endosulfatase with high potential for the structure-function studies of chondroitin sulfate/dermatan sulfate” in Carbohydrate Polymers. In this article, the phylogenetic analysis, cloning, expression, purification, specificity and biochemical characteristics of the identified chondroitin sulfate/dermatan sulfate 4-O-endosulfatase (endoBI4SF) are described in detail. The recombinant endoBI4SF, with a molecular mass of 59.13 kDa, specifically hydrolyzes the 4-O-sulfate groups, but not the 2-O- or 6-O-sulfate groups, in oligo-/polysaccharides of chondroitin sulfate/dermatan sulfate and shows its maximum reaction rate in 50 mM Tris-HCl buffer (pH 7.0) at 50 °C. The enzyme can therefore be a very useful tool for structural and functional studies of chondroitin sulfate/dermatan sulfate.


Specifications Table
Subject: Biochemistry and Molecular Biology
Specific subject area: Phylogenetic analysis, cloning, expression, purification, specificity analysis and biochemical characteristics of the chondroitin sulfate/dermatan sulfate 4-O-endosulfatase endoBI4SF.
Type of data: Table, Graph, Figure
How the data were acquired: The protein sequence (GenBank™ accession number: EDV06292.1) of endoBI4SF was downloaded from the NCBI database. Other data were acquired following the "Experimental Design, Materials, and Methods" described in this article.
Data format: Raw, Analyzed, Filtered
Description of data collection: Sequence alignment and phylogenetic analysis were performed using BioEdit version 7.0.5.3 and MEGA version 7.0. The full-length gene of endoBI4SF without the predicted signal peptide was cloned into the expression vector pET22b(+). EndoBI4SF was induced and expressed in E. coli BL21 (DE3), and purified with a Nickel-Sepharose™ 6 Fast Flow nickel affinity column. The substrate specificity of endoBI4SF was determined using four unsaturated chondroitin sulfate disaccharides with different sulfation patterns. The optimal temperature, optimal pH, effects of metal ions/other chemicals (5 mM

Value of the Data
• The data provide detailed supplementary materials for the paper entitled "A novel 4-O-endosulfatase with high potential for the structure-function studies of chondroitin sulfate/dermatan sulfate" published in Carbohydrate Polymers.
• The data benefit researchers working in CS/DS sulfatase-related fields.
• The data provide the preparation method and biochemical parameters of the endoBI4SF for its further study and application.

Introduction
The data supplement the paper entitled "A novel 4-O-endosulfatase with high potential for the structure-function studies of chondroitin sulfate/dermatan sulfate" in Carbohydrate Polymers [1] and describe the specific activity and biochemical characteristics of endoBI4SF, as well as the methods of cloning, expression, purification and specificity analysis.

Gene and Protein Sequences of endoBI4SF
The endoBI4SF gene (GenBank accession number: EDV06292.1) is 1,548 bp in length with 45.28% GC content and encodes a protein comprising 515 amino acid residues with a 20-amino-acid type II N-terminal signal peptide (Met1-Gly20). The endoBI4SF protein has a predicted molecular weight of 59.13 kDa and a theoretical isoelectric point of 5.51. According to BLASTp sequence alignment, endoBI4SF is annotated as an arylsulfatase. Furthermore, phylogenetic analysis showed that endoBI4SF clustered with D-N-acetylgalactosamine (GalNAc)-4-O-sulfatases, and thus it was preliminarily predicted to be a GalNAc-4-O-endosulfatase (Fig. 1). All of the sequences used to construct the phylogenetic tree are listed in the dataset files in the repository [2].
Fig. 1. Phylogenetic analysis of endoBI4SF was performed based on Clustal W multiple alignments with identified CS/DS sulfatases from bacteria. The phylogenetic tree was constructed by MEGA version 7.0.26 with the neighbor-joining statistical method. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1,000 replicates) is shown next to the branches.
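The sequence statistics quoted above (gene length, GC content, number of encoded residues) follow from simple counting. The following is a minimal Python sketch of that arithmetic, assuming that the 1,548 bp figure includes the stop codon. The toy sequence is made up for illustration and is not the endoBI4SF gene, and the function names are hypothetical.

```python
def gc_content(dna: str) -> float:
    """Return the GC content as a percentage of the sequence length."""
    seq = dna.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def encoded_residues(orf_length_bp: int) -> int:
    """Number of amino acids encoded by an ORF that ends with a stop codon."""
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    # Assumption: the last codon is a stop codon and encodes no residue.
    return orf_length_bp // 3 - 1


# Toy sequence for illustration only (NOT the endoBI4SF gene).
toy_gene = "ATGGCATTTAAA"
print(f"GC content of toy sequence: {gc_content(toy_gene):.2f}%")
print(f"Residues encoded by a 1,548 bp ORF: {encoded_residues(1548)}")
```

Under the stated assumption, a 1,548 bp open reading frame corresponds to 516 codons, one of which is the stop codon, which agrees with the 515 residues reported for endoBI4SF.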

Heterologous Expression and Purification of endoBI4SF
The endoBI4SF gene without the signal peptide sequence was amplified from the genomic DNA of Bacteroides intestinalis DSM 17393. The putative sulfatase fragment was cloned into a pET22b expression vector with a C-terminal His6 tag and then transformed into E. coli BL21 (DE3) cells. Cells harboring the expression vector (pET22b-endoBI4SF) were cultured and induced as described in "Materials and methods". According to SDS-PAGE, the putative endoBI4SF was successfully expressed as a soluble protein of about 59 kDa in the supernatant (Fig. 2), and the protein was approximately 90% pure after Ni²⁺ affinity chromatography. The expression and purification data of the recombinant protein are shown in Table 1.
Fig. 2. Expression and purification of the recombinant endoBI4SF were analyzed by SDS-PAGE using 13.2% polyacrylamide gels followed by staining with Coomassie Brilliant Blue. 1, prestained protein marker (MP102) (Vazyme); 2, lysate of cells transformed with empty pET22b plasmid; 3, lysate of cells transformed with pET22b-endoBI4SF expression plasmid and induced with IPTG; 4, lysate supernatant of cells transformed with pET22b-endoBI4SF expression plasmid and induced with IPTG; 5, purified re-
Δ4,5HexUA and GalNAc represent a double bond between C4 and C5, hexuronic acid and D-N-acetylgalactosamine, respectively, were used as substrates to preliminarily determine the specificity of endoBI4SF. EndoBI4SF was capable of hydrolyzing the 4-O-sulfate groups from ΔA and ΔE to produce ΔO (Δ4,5HexUA1-3GalNAc) and ΔC, respectively, but did not affect the 6-O-sulfate and 2-O-sulfate groups in ΔC and ΔD (Fig. 3). Thus, the putative sulfatase is classified as a GalNAc-4-O-sulfatase. The raw data used for preparing the figures have been stored in a dataset file of Mendeley Data [2].
β1-3GalNAc(4S,6S)).

Biochemical Characteristics of endoBI4SF
The biochemical characteristics of endoBI4SF, including the optimal temperature, optimal pH, effects of metal ions or other reagents, and thermostability, were estimated using ΔA as a substrate, and the raw data can be found in a dataset file of Mendeley Data [2]. EndoBI4SF showed its maximum reaction rate at 50 °C, but the activity was sharply reduced when the temperature was increased to 60 °C (Fig. 4A). EndoBI4SF showed relatively high activity at pH 5.0-8.0, reached its maximum reaction rate in 50 mM Tris-HCl buffer (pH 7.0), and also retained high activity in 50 mM NaAc-HAc buffer (pH 6.0) (Fig. 4B). Li⁺ and Ca²⁺ slightly enhanced the enzyme activity. In contrast, the univalent metal ion Ag⁺, most divalent metal ions, such as Hg²⁺, Pb²⁺, Cu²⁺ and Zn²⁺, and other metal ions such as Fe³⁺ and Cr³⁺ strongly inhibited the activity of endoBI4SF. The reducing reagent DTT (5 mM) promoted the activity of endoBI4SF, but the chelating reagent EDTA showed no significant effect (Fig. 4C).
The thermostability of endoBI4SF was investigated by measuring the residual activity under the optimal conditions (50 mM Tris-HCl, pH 7.0, 50 °C) after preincubating the enzyme at a series of temperatures for 0-24 h. The enzyme retained more than 60% of its activity even after incubation at 40 °C for 24 h, whereas the activity decreased quickly within 3 h when the preincubation temperature was increased to 50 °C (Fig. 4D).
The enzyme kinetics were also determined using ΔA as the substrate under the optimal conditions by fitting the Michaelis-Menten equation with Origin version 9.6. The Vmax and Km of endoBI4SF were 26.87 ± 2.22 U/mg and 5.94 ± 1.43 mM, respectively (Fig. 5). The raw data for preparing the figures can be found in a dataset file of Mendeley Data [2].
Fig. 4. The effects of temperature (A), pH (B), and metal ions, chelating reagent and reducing reagent (C) on the activity of endoBI4SF, and the thermostability of endoBI4SF (D), were determined using ΔA (1 mg/ml) as the substrate. Data are shown as the means of duplicates ± S.D.
Fig. 5. Reactions were performed in triplicate using various concentrations of ΔA with endoBI4SF (3 μg) in 50 mM Tris-HCl buffer (pH 7.0) at 50 °C. The Michaelis-Menten equation was fitted using Origin version 9.6.
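The kinetic analysis above rests on the Michaelis-Menten equation, v = Vmax[S]/(Km + [S]). The source fitted this equation in Origin. The sketch below is not that procedure: it is a dependency-free stand-in that estimates Vmax and Km by Hanes-Woolf linearization ([S]/v plotted against [S]) on synthetic, noise-free rates generated from the reported parameters. The function names and the chosen substrate concentrations are assumptions made for illustration only.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def hanes_woolf_fit(conc, rates):
    """Estimate (Vmax, Km) from the linear form [S]/v = [S]/Vmax + Km/Vmax.

    Requires non-zero substrate concentrations and non-zero rates.
    """
    xs = list(conc)
    ys = [s / v for s, v in zip(conc, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals 1 / Vmax
    intercept = mean_y - slope * mean_x    # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Hypothetical substrate concentrations (mM) within the 0-25 mM range.
substrate_mm = [1.0, 2.5, 5.0, 10.0, 15.0, 20.0, 25.0]
# Synthetic rates (U/mg) generated from the reported Vmax and Km, not raw data.
rates = [mm_rate(s, 26.87, 5.94) for s in substrate_mm]

vmax_fit, km_fit = hanes_woolf_fit(substrate_mm, rates)
print(f"Vmax = {vmax_fit:.2f} U/mg, Km = {km_fit:.2f} mM")
```

With real, noisy measurements a nonlinear least-squares fit (as performed in Origin by the authors) is generally preferable, because linearized forms weight experimental errors unevenly.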

Materials
PrimeSTAR™ HS DNA polymerase for PCR amplification was purchased from Takara Inc.

Sequence Analysis of Gene and Protein of endoBI4SF
The protein sequence (GenBank™ accession number: EDV06292.1) of endoBI4SF was downloaded from the NCBI database. Online similarity analysis was carried out using Protein BLAST ( https://blast.ncbi.nlm.nih.gov/Blast.cgi ). Signal peptide prediction was performed with the SignalP 5.0 server ( https://services.healthtech.dtu.dk/service.php?SignalP-5.0 ). Sequence align-

Heterologous Expression and Purification of Recombinant endoBI4SF, rVAR2 and rVAR2-stGFP
The full-length gene of endoBI4SF without the signal peptide sequence was amplified using high-fidelity PrimeSTAR™ HS DNA polymerase (Takara, Dalian) and the corresponding primer pairs with restriction enzyme sites (Table 2). PCR products were inserted into the NdeI/XhoI restriction enzyme sites of the expression vector pET22b(+). The fluorescent probe rVAR2-stGFP, rVAR2 fused with stGFP at the 3'-terminus, was used to specifically recognize 4-O-sulfated CS-A. The stGFP sequence with homologous recombination regions was amplified and recombined with XhoI-treated pET22b-rVAR2 using the Fast Mutagenesis Kit V2 (Vazyme, Nanjing). The recombinant plasmid pET22b-endoBI4SF, the synthesized plasmid pET22b-rVAR2, and the fusion plasmid pET22b-rVAR2-stGFP were each transformed into E. coli BL21 (DE3). The integrity of the inserted fragments was confirmed by DNA sequencing. To express endoBI4SF, rVAR2 and rVAR2-stGFP, E. coli cells harboring the respective expression vector were grown in LB broth at 37 °C until the A₆₀₀ reached 0.8-1.0, and expression was then induced at 16 °C by adding isopropyl 1-thio-β-D-galactopyranoside (IPTG) to a final concentration of 0.05 mM. After 24 h, the cells were collected by centrifugation at 8,000 × g, suspended in buffer A (50 mM Tris-HCl, 150 mM NaCl, pH 8.0) and then disrupted by sonication (40 cycles of 4 s on and 8 s off) in an ice-cold environment. The cell lysate was clarified by centrifugation at 15,000 × g for 30 min at 4 °C. The supernatant containing the recombinant protein was loaded onto a nickel affinity column packed with Nickel-Sepharose™ 6 Fast Flow (GE Healthcare, Sweden), washed with buffer A containing 10 mM imidazole to remove impurities, and then eluted with buffer A containing 250 mM imidazole to collect the target proteins.
After desalting with an Amicon Ultra 0.5-ml 10K unit (Millipore), the purified proteins were exchanged into PBS buffer and analyzed by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) followed by staining with Coomassie Brilliant Blue R-250. The concentration of the purified protein was determined using a BCA Protein Assay Kit (Cwbio, Shanghai).
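The induction step above specifies only the final inducer concentration (0.05 mM). As a minimal sketch of the underlying dilution arithmetic (C1V1 = C2V2), the snippet below computes how much stock solution gives that final concentration. The 100 mM stock concentration and the 1 L culture volume are assumptions chosen for illustration and are not stated in the source; the function name is hypothetical.

```python
def stock_volume_ml(final_mm: float, total_volume_ml: float, stock_mm: float) -> float:
    """Volume of stock (ml) needed so the final mixture has the target concentration.

    Uses C1 * V1 = C2 * V2, where V2 is the total final volume.
    """
    if stock_mm <= 0 or final_mm < 0 or total_volume_ml <= 0:
        raise ValueError("concentrations and volume must be positive")
    if final_mm > stock_mm:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_mm * total_volume_ml / stock_mm


# Assumed values: 100 mM inducer stock and a 1,000 ml culture (not from the source).
volume = stock_volume_ml(final_mm=0.05, total_volume_ml=1000.0, stock_mm=100.0)
print(f"Add {volume:.2f} ml of stock to reach 0.05 mM")
```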
Restriction enzyme sites are underlined. Amp r , ampicillin-resistant.

Substrate Specificity of endoBI4SF Toward CS Disaccharides
To determine the substrate specificity of endoBI4SF, four unsaturated CS disaccharide substrates (2 nmol) with different sulfation patterns (ΔA, ΔC, ΔD and ΔE) were incubated with endoBI4SF (2 μg) in 50 mM Tris-HCl buffer (pH 7.0) at 30 °C overnight. After inactivation at 85 °C for 10 min, cooling in ice-cold water and centrifugation at 15,000 × g for 10 min, the supernatants of the endoBI4SF-treated CS disaccharides were collected and labeled with 2-aminobenzamide (2-AB) [4]. After extraction with chloroform to remove the free 2-AB, all the labeled samples were analyzed by anion exchange HPLC on a YMC-Pack PA-G column (YMC, Japan) eluted with a linear gradient from 16 to 460 mM NaH₂PO₄ over 60 min at a flow rate of 1.0 ml/min at room temperature. The products were monitored with a fluorescence detector at excitation and emission wavelengths of 330 and 420 nm, respectively.
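The linear gradient described above maps elution time directly to eluent concentration. A minimal sketch, assuming an ideal linear gradient with no delay volume or column dead time (the function name and default arguments are illustrative):

```python
def gradient_concentration(t_min: float,
                           start_mm: float = 16.0,
                           end_mm: float = 460.0,
                           duration_min: float = 60.0) -> float:
    """Eluent concentration (mM) at time t_min for an ideal linear gradient."""
    if not 0.0 <= t_min <= duration_min:
        raise ValueError("time must lie within the gradient duration")
    return start_mm + (end_mm - start_mm) * t_min / duration_min


# Concentration at a few time points of the 16-460 mM, 60 min gradient.
for t in (0.0, 15.0, 30.0, 60.0):
    print(f"t = {t:4.1f} min -> {gradient_concentration(t):.1f} mM")
```

In practice the concentration actually reaching the column lags behind this ideal profile by the system's delay volume, so the values are only approximate guides for assigning retention times.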

Biochemical Characterization of Recombinant endoBI4SF
To determine the optimal temperature of endoBI4SF, the activity of endoBI4SF (about 0.2 μg) was tested with 30 μg ΔA (1 mg/ml) in 50 mM Tris-HCl (pH 7.0) at temperatures from 0 to 70 °C for 5 min. The effects of pH were determined using a series of buffers with different pH ranges, including NaAc-HAc buffer (50 mM, pH 5.0-6.0), NaH₂PO₄-Na₂HPO₄ buffer (50 mM, pH 6.0-8.0), and Tris-HCl buffer (50 mM, pH 7.0-10.0), in a total volume of 30 μl at the optimal temperature for 5 min. The effects of metal ions/other chemicals (5 mM) on endoBI4SF were investigated by incubating them with ΔA and the enzyme at the optimal temperature and pH as described above. To determine the thermostability of endoBI4SF, the purified enzyme was pre-incubated at temperatures from 0 to 60 °C for 0-24 h, and the residual activity was determined by incubating the reaction mixture for 5 min under the optimal conditions. All reactions were carried out in duplicate. The enzymatic activities were determined by gel filtration HPLC using a Superdex™ Peptide 10/300 GL column with 0.20 M NH₄HCO₃ as the mobile phase at a flow rate of 0.4 ml/min. The absorbance at 232 nm was monitored using a UV detector and analyzed online using the software LCsolution version 1.25 [5,6]. In addition, the reaction rate of endoBI4SF against ΔA at final concentrations of 0-25 mM was measured under the optimal conditions to analyze the enzyme kinetics. The kinetic parameters were calculated by fitting the Michaelis-Menten equation with Origin version 9.6.
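Relative and residual activities from duplicate reactions, as described above, are commonly expressed as a percentage of a reference condition together with a standard deviation. The sketch below illustrates one such calculation. The normalization scheme, the function name and the example values are assumptions and are not taken from the source or its dataset.

```python
from statistics import mean, stdev


def relative_activity(sample_values, reference_values):
    """Return (mean, S.D.) of sample activity as a percentage of the reference mean.

    Both arguments are replicate activity values in the same arbitrary units.
    """
    if len(sample_values) < 2:
        raise ValueError("at least two replicates are needed for an S.D.")
    reference = mean(reference_values)
    if reference == 0:
        raise ValueError("reference activity must be non-zero")
    percentages = [100.0 * value / reference for value in sample_values]
    return mean(percentages), stdev(percentages)


# Hypothetical duplicate measurements (arbitrary units), for illustration only.
control = [100.0, 100.0]   # e.g. enzyme without preincubation
treated = [60.0, 80.0]     # e.g. enzyme after preincubation

avg, sd = relative_activity(treated, control)
print(f"Residual activity: {avg:.1f} ± {sd:.1f} %")
```

With only two replicates per condition, the sample standard deviation is a rough indicator of spread rather than a robust estimate.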

Ethics Statements
These data do not involve human subjects, animal experiments, or data collected from social media platforms.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.